1. INTRODUCTION

This report documents the second year of work on the Program Reference
Language project (PRL), which is a basic research effort aimed at
the  creation  of  a  mechanism  for  flexibly  identifying  the  interesting 
portions  of  programs.  Although  this  work  began  as  an  investigation  into 
query  languages  which  provided  textual  and  syntactic  search  predicates, 
it  has  grown  over  the  course  of  our  research  into  a  knowledge  base  about 
programs  in  general,  and  into  a  database  that  documents  the  structures 
present  in  specific  pieces  of  code.  This  development  is  discussed  at 
length in the annual report for the first year of research, and is
recapped only briefly below (see "Searching a Knowledge Base of Programs
and Documentation" [Shapiro-83] for more details). This document
focuses  on  a  study  of  program  comprehension  which  we  performed  in  order 
to  elicit  the  information  required  to  design  a  formal  query  language  for 
the  PRL. 

1.1  PROJECT  HISTORY  AND  ACCOMPLISHMENTS 

Our  original  proposal  defined  the  Program  Reference  Language  as  a 
tool  for  flexibly  accessing  the  interesting  portions  of  programs.  The 
project was a two-year effort whose goal was to perform research aimed
at supporting program creation and maintenance by allowing programmers
to locate specific sections of code via textual, syntactic, and historic
search requests (related to a stack of program locations that had been
visited).

The  PRL  was  originally  intended  as  a  focused  tool  for  accomplishing 
this  end.  However,  in  the  course  of  our  research,  we  developed  the 
hypothesis  that  the  correct  approach  to  supporting  program  search  was  to 
create  a  system  that  captured  knowledge  about  the  structure  of  code.  At 
the  simplest  level,  this  meant  being  able  to  locate  loops,  procedures, 
and  variables  via  cross-indexing  schemes.  On  a  more  sophisticated 
plane,  it  meant  creating  an  automated  understanding  of  the  structure  of 
programs  so  that  regions  of  code  could  be  located  based  on  their 
descriptions.  This,  in  turn,  required  the  definition  and  support  of  a 
vocabulary  for  referencing  code  which  was  in  tune  with  the  terminology 
that  programmers  natively  employed. 
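
As a minimal sketch of what the simplest level of cross-indexing might look like, the following example builds an index from construct kinds (procedures, loops, variables) to the lines on which they occur. It is written in Python purely for illustration and relies on Python's standard `ast` module; the function name `build_cross_index` and the sample program are assumptions made for this example and are not part of the PRL or EPM design.

```python
import ast
from collections import defaultdict


def build_cross_index(source):
    """Return {kind: {name: sorted line numbers}} for one source text."""
    index = {
        "procedure": defaultdict(set),
        "loop": defaultdict(set),
        "variable": defaultdict(set),
    }
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            index["procedure"][node.name].add(node.lineno)
        elif isinstance(node, (ast.For, ast.While)):
            # Loops have no name of their own, so index them by keyword.
            index["loop"][type(node).__name__.lower()].add(node.lineno)
        elif isinstance(node, ast.Name):
            index["variable"][node.id].add(node.lineno)
    return {
        kind: {name: sorted(lines) for name, lines in entries.items()}
        for kind, entries in index.items()
    }


# Hypothetical sample program: sums a list of scores.
SAMPLE = (
    "def total(scores):\n"
    "    result = 0\n"
    "    for s in scores:\n"
    "        result = result + s\n"
    "    return result\n"
)

index = build_cross_index(SAMPLE)
print(index["procedure"])           # {'total': [1]}
print(index["loop"])                # {'for': [3]}
print(index["variable"]["result"])  # [2, 4, 5]
```

An index of this kind answers only "where does this name or construct occur" questions; locating code from a description of what it does requires the richer structural knowledge discussed in the rest of this section.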

We  took  the  task  of  defining  such  a  vocabulary  as  the  major  goal 
of  the  PRL.  Over  the  course  of  our  first  year  of  research,  it  led  to 
the design of a knowledge base that captured much of the syntactic,
semantic, and pragmatic (domain of application) structures
in programs, as well as to the creation of a search mechanism which
was able to access that data. During this work, we came to the conclusion
that the knowledge required to support program search was in
fact identical to the information required to support a variety of
intelligent tools for manipulating code. Hence, the PRL grew into
a knowledge representation system for recording facts about programs in
general, and into a database for documenting the structures present in
specific  pieces  of  code.  To  emphasize  this  fact,  we  renamed  the  entire 
system  the  Extended  Program  Model  (EPM),  and  consider  the  PRL  to  be 
one  part. 

At the end of our first year of research, we presented a concept
feasibility demonstration of a system which supported program search
through a knowledge base representing a toy ADA program. Although it
required some admittedly tedious commands, the system was able to answer
the request, "find the initializations of the loop which computes the
sum of the test scores". This task involved integrating textual and
syntactic clues, as well as references to data flow information and
programming cliches. The demonstration system is discussed in
[Shapiro-83], and (together with the design of the EPM) is the subject of
two published papers, presented at the 1983 Trends and Applications
Conference [Shapiro-83b] and the Seventh International Conference on
Software Engineering [Shapiro-84].

To provide further context for our work, we also explored an application
of the EPM to the program creation process. We outlined a system,
called the Intelligent Program Editor (IPE), that employed the
EPM's knowledge base to augment the capabilities of standard text editors.
Using this information, we felt that the IPE would be able to
express semantically oriented consistency constraints, perform large-scale
editing transformations, and even provide support for the
template-based creation of programs using the vocabulary defined by the
EPM. In January of 1982, the Intelligent Program Editor moved into
separate  development  at  AI&DS  through  a  research  grant  from  the  Office 
of  Naval  Research. 

Our  plans  for  the  third  year  of  research  have  focused  on 
transforming  the  EPM  (as  defined  above)  from  the  concept 
feasibility  phase  into  a  more  practical  piece  of  technology.  In  light 
of  that  goal,  we  undertook  the  study  reported  on  in  this  document, 
which  was,  in  essence,  an  informal  experiment  aimed  at  identifying 
the  vocabulary  and  the  procedures  programmers  use  to  search  through 
code. We administered this test to several professional programmers
and research personnel within AI&DS. At the time of this writing,
the results of the study are being used to motivate the definition (in
terms of vocabulary and syntax) of the formal query language which will
become the PRL.


1.2  RESEARCH  OBJECTIVES 


Some  of  the  key  research  issues  which  have  been  addressed  in  this 
project  are: 


1.  What  are  the  most  useful  ways  of  referring  to  parts  of  a  program? 
Said  in  a  different  way,  what  vocabulary  do  programmers  currently 
use  to  describe  portions  of  their  programs? 

2.  What information must be included in a knowledge base about programs
and documentation in order for it to support program search?

3.  What information must be included in such a knowledge base for it
to support a variety of intelligent tools for accessing and manipulating
code?

4.  How  should  information  of  this  kind  be  represented? 

5.  How should application-specific knowledge be included?

6.  How  can  user-supplied  assertions  and  other  documentation  be 
acquired  and  integrated  into  a  knowledge  base  for  use  in  program 
referencing  and  other  tasks? 

7.  How can search requests be expressed in a uniform reference
language?

8.  What  form  of  a  search  mechanism  is  required  to  implement  these 
reference  requests? 

9.  How  can  these  searches  be  performed  efficiently?  In  what  ways  can 
search be limited or deferred in order to maintain good response
time? 


1.3  GUIDE  TO  READING 

The following chapters provide detail on the study of program
comprehension which was performed. Chapter 2 introduces the
specific goals of the study, Chapter 3 discusses the design of the
informal experiment that was conducted, Chapter 4 gives the actual
information presented to the experimental subjects, Chapter 5
describes the data that was collected, Chapter 6 begins the
analysis of results, and Chapter 7 describes the implications of those
results for the PRL and EPM. Chapter 8 provides a brief summary of
this material, while Chapter 9 provides a discussion of our future
research plans. Chapter 10 contains a discussion of key research
personnel involved with this project.


Appendix  A  contains  the  listing  of  the  program  used  for  the  study, 
and  Appendix  B  is  the  questionnaire  that  was  employed. 
2. A STUDY OF PROGRAM COMPREHENSION


To  achieve  its  purpose  and  to  be  accepted  as  a  useful  tool,  the  PRL 
design  would  do  well  to  pay  attention  to  the  way  programmers  currently 
perform  the  common  task  of  learning  about  an  unfamiliar  program.  Except 
where  it  can  dramatically  increase  performance  without  introducing  a 
prohibitive  learning  cost,  the  PRL  should  present  them  with  a  conceptual 
model  of  the  program  that  is  consistent  with  the  one  they  use  now.  It 
should let them perform operations analogous to the ones they use mentally,
but it should speed up the process by keeping all the information
integrated and on-line. It should increase reliability by
automatically  deriving  information  whenever  possible,  to  avoid 
discrepancies  between  the  program  text  and  the  support  information. 
Additional  functionality  may  be  provided,  but  it  still  should  aim  to 
stay  within  programmers'  mental  model  of  programs. 

Accordingly,  we  performed  a  study  intended  to  explore  programmers' 
mental  models  of  programs  and  their  methods  for  program  comprehension. 
In particular, the focus was on the steps taken in studying an unfamiliar
program, and the vocabulary used to describe parts of it. In addition,
recommendations were solicited for useful extensions to the capabilities
of current support tools. We wanted to know not only what programmers
currently do, but also what they wish or imagine they could do
if given the right tools to help. This was not meant to be a controlled
experiment; rather, it was an exploratory attempt to gather qualitative
data  on  programmer  performance  and  preferences. 

The  PRL  design  incorporates  different  views  of  the  program,  each  of 
which  makes  different  information  readily  apparent.  Some  proposed  views 
are  similar  to  the  stages  of  analysis  required  by  compilers  to  translate 
a  program  text  into  a  running  program.  Some  views  are  also  similar  to 
the  information  captured  in  certain  popular  types  of  documentation,  such 
as flow charts and cross reference listings. Because these types of
information are already well-known programming tools, it seemed clear
that they ought to help program comprehension still more if maintained
in an on-line representation that promotes both human and machine
processing. One result of this study was to confirm that programmers are
already used to thinking about programs in these ways, and have a
natural vocabulary for describing parts of programs from these views.
3. DESIGN OF THE STUDY


The  goal  in  this  study  was  to  observe  programmers  going  about  the 
business of understanding programs. Rather than hand the subjects a
listing and ask them to read it, we chose a debugging task. Asking
the subjects to fix a problem in some piece of code gave them a common
task to focus on, and stronger motivation to learn about the program.
It was also hoped that the time needed to complete the task would provide
some measure of how well they were comprehending the program.

We wanted the program used for this study to reflect, as much
as possible, the realities of code as it is found in the normal production
environment. There were of course limits on the size of the program
that subjects could be expected to study. A Lisp program which had
originally been written for an earlier study of debugging [Shapiro-81]
was chosen as the sample program. It seemed a good choice because it
contained a single bug that was somewhat subtle, but still deemed
discoverable. The program is about 300 lines long, and its listing is
approximately seven pages; it is included as Appendix A. The important
features of this program, aside from its language and length, are that
it was new to all of the subjects in the study, and that it was
presented in a fairly scrambled state. This disorganization was the
result of porting the program across three different environments and
was of value in the study because it truly reflects the state of many
programs.

Programmers  generally  do  not  perform  their  job  in  a  vacuum;  they 
have some sort of support environment. That support is normally composed
both of software tools and documentation. However, for the purposes
of the study, it was decided to eliminate the tools so that the
programmers' work on the program, rather than their facility with certain
existing tools, could be observed. Thus even though the program
was  in  Lisp,  the  programmers  did  not  have  access  to  a  Lisp  interpreter 
or  environment. 

It  was  also  decided  to  pare  back  the  documentation  to  the  bare 
minimum,  once  again  to  reflect  common  real  world  conditions.  As  an  aid 
to  designing  this  study,  a  single  subject  was  run  in  a  pilot  trial.  The 
pilot  subject's  ignorance  turned  out  to  be  far  too  extensive.  He  was 
given the program listing and told to find the bug. He was given no
documentation at first: no information about the purpose of the program,
its inputs and outputs, or even the external manifestation of the bug.
After an hour of studying the listing it became clear that more information
was needed. It was necessary to ask the program's author for a
brief  sketch  of  the  program's  purpose  and  design,  and  its  incorrect  I/O 
behavior. 

Based  on  this  experience,  it  was  decided  to  provide  a  limited 
amount  of  documentation.  A  short  packet  was  prepared,  describing  the 
purpose and workings of the program followed by a bug report with pictures
of several successive output states illustrating the bug. This
information  packet  is  shown  in  Figure  3-1.  The  subjects  were  also 
allowed  to  refer  to  the  language  manual  for  the  dialect  of  Lisp  used. 
The  program  itself  had  only  a  minimal  number  of  comments  written  by  the 
original  author  for  his  own  use.  No  information  on  the  history  of  the 
program  was  provided. 

One  interesting  observation  in  the  pilot  study  was  the  importance 
of  writing  on  the  listing.  The  subject  marked  up  the  listing  with 
highlighters,  pens,  and  pencils  in  multiple  colors.  Marginal  notes  and 
assorted  doodles  left  a  trail  indicating  which  parts  had  been  studied, 
and  were  used  to  record  discoveries  made  along  the  way.  These  markings 
were  interesting  both  as  an  indication  of  how  the  subject  attacked  the 
task,  and  in  their  own  right  as  a  set  of  features  which  might  be  worth 
providing in the interface of an on-line tool. As a result, the subjects
were encouraged to write on their listing; the listings were collected
at the end of each trial and considered part of the collected
data. 


Based on an evaluation of the data generated by the pilot subject,
and a review of the PRL project's needs, a list of questions was constructed
for the subjects to focus on while performing the debugging
task. A questionnaire was also designed for them to fill out when they
finished the task. This questionnaire also became the form around which

PROGRAM DESCRIPTION:

The test program is a morphogenesis simulation, called PROSPER, which
loosely models the growth of a colony of bacteria. In PROSPER, the user
provides an initial pattern of cells and a collection of production
rules which govern their division. Cells are created with division
times, and cancer (C) cells are expected to divide more frequently than
normal (A) cells. The simulation outputs a trace of the bacteria colony
through time.

The  default  initial  pattern  of  cells  looks  like  this: 

    A
  A C A
    A

Sample  productions  might  look  like  these: 


                 Grow        A
        A A A    -->     A A A
          A                A

          A    Carcinoma   A
        A A A    -->     A C A
          A                A

          A    Metastasis  A
        A C A    -->     A C C A
          A                A


THE  BUG  REPORT: 

The program was started by calling "T1UTER-PROSPER" without any
arguments. This has the effect of starting the program from the
default initial configuration pictured above. The sequence of output
frames generated is reproduced below. The problem is that the user
expected  the  productions  to  cause  an  explosive  growth  of  cancer  cells 
(cells  of  type  "C"),  and  instead  the  A  cells  grew  abundantly. 


SAMPLE  OUTPUT  FROM  PROSPER: 


[Frames 1 through 6 of the output trace appeared here; the cell layouts
are not legible in this copy.]


Figure  3-1:  Documentation  Packet  for  Program  Comprehension  Study 
4. THE STANDARD FORM OF THE STUDY


Based  on  the  pilot  trial,  a  basic  format  was  chosen.  However,  this 
format  was  varied  in  some  details  over  the  course  of  the  several  trials, 
in  response  to  comments  by  the  subjects  as  well  as  our  own  observations. 
This  study  was  not  in  any  sense  designed  to  be  a  controlled  experiment; 
a  controlled  situation  did  not  seem  as  important  as  one  that  reflected  a 
realistic  scenario  yet  still  allowed  the  observation  and  recording  of 
subjects'  behavior.  For  example,  when  programmers  complained  that  the 
trials  were  too  long,  the  allowed  time  was  reduced  from  an  original 
limit  of  four  hours  to  two  hours. 

A  small  documentation  packet  was  given  to  subjects  at  the  beginning 
of  the  trial.  This  was  the  only  information  the  subjects  were  given 
about  the  program  and  the  bug.  This  packet  is  shown  in  Figure  3-1.  It 
describes the purpose of the program and a little about its organization:
namely, the program is like a mathematical game called "Life"
(with which many programmers are familiar), and it is based on production
rules, which specify how the cells reproduce based on their pattern
of arrangement. The packet describes the bug and gives an example of
the erroneous I/O behavior. The expected behavior of the program was
that the cancer cells would reproduce much more quickly than the normal
cells. Instead, the normal cells grew, and the cancer cells did not.
The error was caused by an improper use of a subroutine that inserted
cells into a priority queue ordered by cell division time.


The  instructions  for  the  study  are  shown  in  Figure  4-1.  The  basic 
task  required  the  subjects  to  concentrate  on  finding  and  fixing  the  bug 
in  the  program.  While  they  worked  at  this  task,  they  were  requested  to 
speak  out  loud  to  reveal  what  they  were  thinking.  It  was  explained  to 
the  subjects  what  sort  of  information  was  desired,  and  a  set  of  four 
questions  was  provided  to  focus  their  introspective  reports.  Those  four 
questions are the central issues around which this study is organized.


As  the  instructions  indicate,  data  was  collected  by  several  means. 
The  primary  information  came  from  the  subjects  themselves  as  they  talked 
about  what  they  were  doing.  Though  there  was  a  tape  recorder  running, 
we  primarily  relied  on  the  experimenter's  notes  for  data.  Occasionally 
the  experimenter  interrupted  the  subjects  to  remind  them  of  one  of  the 
central  focus  questions,  or  to  query  them  specifically  about  their 
current  line  of  attack. 


Following  completion  of  the  task,  or,  more  frequently,  when  time 
ran out, the subject was given a questionnaire to complete. This questionnaire
is shown in Appendix B. It presents the four basic questions
for  the  study,  and  adds  a  couple  of  new  questions.  The  subjects  were 
asked  to  comment  on  the  experiment  itself,  and  in  particular  on  the 
issue  of  supporting  documentation.  Finally,  an  annotated  list  of  sample 
vocabulary  for  referring  to  programs  was  provided,  in  order  to  stimulate 

INSTRUCTIONS FOR THE EXPERIMENT:

Your  task  is  to  find  and  correct  the  bug  in  the  program  PROSPER.  Our 
interest  is  in  the  way  in  which  you  go  about  that  task.  In  order  to 
record  your  process  of  exploration  and  understanding,  we  want  you  to 
feel  free  to  mark  up  the  listing  in  any  way  you  want.  We  will  also  have 
a  tape  recorder  running,  and  encourage  you  to  produce  a  running 
monologue  of  your  thoughts.  From  time  to  time,  the  experimenter  may  ask 
you  a  question  to  prod  you  into  revealing  what  you  are  thinking  about. 
The  experimenter  will  also  be  taking  notes  on  what  he  thinks  you  are 
doing. 

In  particular,  we  would  like  you  to  pay  attention  to  the  following  sorts 
of  issues  and  to  record  comments  on  them  when  appropriate: 

1)  What  questions  do  you  ask  about  the  program's  structure 
and  design? 

2)  What  sort  of  vocabulary  do  you  use  to  refer  to  objects 
in  the  program,  and  the  relations  between  them? 

3)  What  sort  of  hypotheses  do  you  construct,  and  how  do  you 
evaluate  them? 

4)  What  aids  for  searching  through  the  program  would  you 
like  to  have? 


Figure  4-1:  Instructions  for  Program  Comprehension  Study 
5. DATA FROM THE STUDY


The study was run with five subjects. They were all experienced
programmers, fellow employees at AI&DS. They had varying degrees of
familiarity  with  the  particular  dialect  of  Lisp  used  in  the  program. 
None  of  them  had  any  previous  knowledge  of  the  program  or  of  the  type  of 
bug  it  contained. 

Of the five subjects, only one actually found the bug in the allotted
time. This does not in any way denote failure, since actual debugging
was not the focus of our study; rather the emphasis was on observing
how the subjects went about studying the program. However, it was
apparent that the single subject who found the bug exhibited behavior
which differed markedly from the others. All of the subjects spent a
lot of time hand-simulating the execution of the code; this simulation
required  extra  concentration,  and  errors  that  resulted  often  hindered 
their  efforts.  There  was  substantial  difference  in  the  depth  to  which 
subjects  followed  trains  of  subroutine  calls  on  early  passes  through  the 
code.  The  successful  subject  was  the  one  who  was  the  best  at  staying  at 
a  high  level. 

Vocabulary was mainly collected from the subjects' verbalizations,
and was largely uniform across subjects. Similarly, annotations to the
listings, when collected and analyzed, showed relatively consistent patterns
across subjects. Additional data was provided by the questionnaires
completed by the subjects at the end of each trial.



6.  ANALYSIS  OF  RESULTS 


This section presents an analysis of the data described in the previous
section. In compiling the results of each trial we largely followed the
format of the questionnaire; however, an analysis of the subjects' markings
on the program listing was added.


6.1  VARIATIONS  IN  PROGRAMMERS'  STYLES  OF  PROGRAM  UNDERSTANDING 

The  task  we  set  for  the  subjects  was  to  find  and  fix  a  bug  in  a 
program.  It  was  normally  assumed  that  there  was  only  one  bug,  and  that 
it was fairly well localized. This task should have elicited goal-directed
understanding; there was no need for the subjects to understand
the entire program. It turned out that these programmers varied considerably
in their ability to focus on the problem.

All the subjects adopted a strategy of making a first pass through
the program at a high level to get an overview of the program's structure.
The plan was to gain a general understanding which would allow
the construction of useful hypotheses. Generally they began at the main
routine and started tracing through the program's execution to some limited
depth. Early on, the focus was on the data structures, and later
on, the routines that manipulate them. In the absence of more complete
documentation this is a necessary information-gathering step.
Familiarity with the program's basic structure is intended to aid
generation  of  hypotheses  about  the  bug.  The  more  specific  the 
hypothesis,  the  more  specific  the  knowledge  needed  to  form  and  then  test 
it.  In  fact  there  are  certain  very  general  classes  of  bug  hypotheses 
that  most  programmers  will  make  based  on  little  or  no  information; 
experience  indicates  that  these  types  of  mistakes  are  nearly  universal. 
Examples  include  passing  parameters  to  a  subroutine  in  the  wrong  order, 
and  (in  Lisp)  incorrectly  grouping  items  in  parentheses.  Checking  for 
these  simple  but  common  mistakes  can  be  time  consuming,  and  if  done  on 
the  first  pass,  may  defeat  the  plan  of  performing  a  quick  overview. 
This  approach  also  allows  major  features  of  the  program's  organization 
to  escape  notice  for  long  periods  of  time. 

This  is  in  fact  what  typically  happened  to  the  subjects  in  this 
study.  Drawn  on  by  the  possibility  of  finding  some  simple  error,  most 
of  the  subjects  tended  to  push  to  deeper  levels  more  quickly  than  they 
had  intended.  They  often  felt  they  might  as  well  rule  out  such  problems 
in  a  section  of  the  code  during  their  first  reading  of  it;  by  staying  in 
order, they were sure they wouldn't miss anything. Also, as one subject
noted,  the  extra  study  at  these  lower  levels  could  potentially  prove 
useful  later  on. 

Only  the  single  successful  subject  really  held  to  his  initial  plan 
of  performing  a  high  level  overview;  for  the  type  of  bug  in  this  study, 
this  seemed  to  be  the  right  strategy.  The  signs  of  where  the  problem 
might lie were most apparent when taking a global view of the program,
because one manifestation of the error was an inconsistent usage of a
subroutine call. The successful subject was comparing all the places
where a priority queue insertion operation was performed when he noticed
this inconsistency.

Much of a well-written program is built out of common structures
known to all programmers. Following the terminology developed by the
Programmer's Apprentice Project at MIT (see [Rich-81] and [Waters-78]),
we call such commonly used components "cliches". The bug was
caused by the incorrect implementation of a list insertion cliche.
Thus, in this case, the understanding of such a cliche was important.
The successful subject was in fact explicitly aware of this
cliche and of its limitations. In general, cliches are important
because they speed understanding by chunking the program into well
understood higher level units. An understanding of the specific limitations
and likely failure modes of cliches is also a powerful asset in
debugging programs.
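
As a purely illustrative sketch of one such failure mode, consider a sorted-list insertion of the kind a priority queue ordered by division time might use. The code is hypothetical and written in Python rather than the Lisp of the study program; the names `Cell`, `insert_by_time`, and `kinds` are assumptions made for this example, and the actual PROSPER bug is not reproduced here. The routine returns the head of the list, so a caller that ignores the returned value silently loses any cell inserted at the front.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cell:
    kind: str
    division_time: int
    next: Optional["Cell"] = None


def insert_by_time(head, cell):
    """Insert cell into a list sorted by division_time; return the head."""
    if head is None or cell.division_time < head.division_time:
        cell.next = head
        return cell  # new head: the caller must keep this result
    node = head
    while node.next is not None and node.next.division_time <= cell.division_time:
        node = node.next
    cell.next = node.next
    node.next = cell
    return head


def kinds(head):
    """List the cell kinds in queue order."""
    out = []
    while head is not None:
        out.append(head.kind)
        head = head.next
    return out


# Consistent usage: every call site keeps the returned head.
queue = None
queue = insert_by_time(queue, Cell("A", 5))
queue = insert_by_time(queue, Cell("C", 2))
queue = insert_by_time(queue, Cell("A", 7))
print(kinds(queue))   # ['C', 'A', 'A']

# Inconsistent usage: the returned head is discarded.
broken = insert_by_time(None, Cell("A", 5))
insert_by_time(broken, Cell("A", 7))  # works by accident (not a head insertion)
insert_by_time(broken, Cell("C", 2))  # head insertion: the new cell is lost
print(kinds(broken))  # ['A', 'A']
```

The second call site works for most inputs, which is what makes this kind of error hard to see from within a single routine and easier to see when all uses of the insertion operation are compared side by side.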


6.2  VOCABULARY  FOR  REFERRING  TO  PROGRAMS 

The last question on the questionnaire, which dealt with vocabulary,
was often partially ignored; this seems due to the unstructured
nature of the question. While most subjects commented on the vocabulary
examples given, they rarely added new examples of their own. New
vocabulary then was most often culled from the experimenter's notes of what
the subject said. Analysis of the language used by the subjects in
their introspective reports yields five distinct categories of vocabulary.
An explanation of each of these classes, with examples, follows.



1.  Computer  language  specific  terminology 

For  this  study  the  program  was  written  in  Lisp  and  a  large 
number  of  Lisp  specific  terms  were  common  in  the  subjects'  speech. 
Names  of  particular  functions  and  language  keywords  are  bound  to 
show  up,  if  only  to  designate  locations  in  the  listing,  as  in  the 
phrase  "The  second  argument  of  cons".  Function  names  can  also  be 
used  to  designate  the  concepts  they  represent  in  the  language. 
Again,  the  Lisp  function  cons  creates  a  new  data  structure  in 
memory,  so  a  phrase  such  as  "the  cons  of  a  and  b"  refers  to  a 
language-specific entity. Other language-specific concepts or terminology
for more general concepts were also evident. Subjects
frequently  spoke  of  the  binding  of  variables,  a  Lisp  term  for  the 
value  of  a  variable  in  a  certain  context. 

2.  General  programming  cliches 

The  program,  despite  its  disorganization,  lapses  of  style,  and 
lack  of  documentation,  was  largely  constructed  out  of  commonly  used 
programming  abstractions  (cliches).  It  made  heavy  use  of  a  hash 
table abstraction and of a priority queue. These were fairly obvious
in the code and were noticed eventually by all the subjects.
Each  such  cliche  comes  with  some  vocabulary  commonly  used  to  refer 
to  its  parts  and  the  operations  defined  for  it.  Simple  examples 
include  insertion  and  deletion.  The  subjects  did  in  fact  use  this 
terminology  when  talking  about  parts  of  these  cliches. 

3.  Domain  specific  terminology 

The domain of the study program was a colony of cells, some of them
cancerous, growing in some environment. While not experts in cell
biology, all the subjects developed some reasonable expectation of the
program's behavior based on their understanding of the domain. As with
the language-specific vocabulary, many of these terms appear as
function names, in this case defined and later used in the program, and
similarly as variable names. Subjects frequently found occasion to talk
about cells and metastasis. Potentially an even richer source of such
terminology is the documentation that should accompany a program.

4.  Natural  Language  Constructs 

In natural language, objects may be referenced in a number of indirect
ways. Anaphora designates references to objects previously mentioned in
the discourse. Definite noun phrases or pronouns may serve this
purpose. For example, subjects frequently designated an argument to a
function as "...its first argument." The "its" refers to the function.

Deixis  designates  references  to  objects  present  in  the 
environment,  either  by  pointing  or  description.  Such  references 
are  more  common  when  trying  to  make  clear  to  someone  else  which 
object  you  are  referring  to.  Deixis  was  accordingly  less  common  in 
this  study,  as  the  subjects  felt  they  were  primarily  talking  to 
themselves. 

5.  Idiosyncratic,  user-specific  views  of  the  world 

It  was  apparent  that  the  subjects  developed  different  models 
of  the  program  varying  along  idiosyncratic  lines.  For  example,  one 
subject  viewed  the  several  different  internal  representations  of 
the  cell  colony  as  successive  projections  (in  a  mathematical  sense) 
of  the  basic  representation,  which  he  took  to  be  the  events  queue. 
He  used  this  terminology  to  talk  about  the  data  structures  and  the 
algorithms  that  mapped  between  them. 
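
To make the second category concrete, the sketch below shows a priority
queue cliche together with the operation vocabulary (insert, top,
delete) that attaches to it. It is written in Common Lisp rather than
in the dialect of the study program, and the names pq-insert, pq-top,
and pq-delete-top are illustrative assumptions, not functions taken
from the study program.

```lisp
;; A priority queue kept as a list of (priority . item) pairs,
;; sorted by ascending priority.  All names here are hypothetical.

(defun pq-insert (item priority queue)
  "Return a new queue with ITEM placed in PRIORITY order (non-destructive)."
  (cond ((or (null queue) (< priority (car (first queue))))
         ;; ITEM belongs in front of everything that remains.
         (cons (cons priority item) queue))
        (t
         ;; Keep the current head and insert further down the list.
         (cons (first queue)
               (pq-insert item priority (rest queue))))))

(defun pq-top (queue)
  "Return the item with the smallest priority, or NIL for an empty queue."
  (cdr (first queue)))

(defun pq-delete-top (queue)
  "Return QUEUE without its first entry."
  (rest queue))

;; Usage: insert three items out of order, then look at the head.
(let ((q (pq-insert 'b 2 (pq-insert 'c 3 (pq-insert 'a 1 nil)))))
  (pq-top q))   ; evaluates to A
```

A programmer who recognizes this structure will tend to describe it
with the cliche's own words ("insert into the queue", "the top of the
queue") regardless of what the functions happen to be called.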


6.3  ANALYSIS  OF  PROGRAM  LISTING  ANNOTATIONS 


The  listings  given  to  the  subjects  for  study  were  collected  at  the 
end  of  each  trial  and  analyzed  in  order  to  determine  what  sorts  of 
interface  facilities  should  be  made  available  to  a  user  in  an  on-line 
tool.  Both  graphic  and  textual  annotations  were  common. 


The major classes of graphic annotations were highlighting, grouping,
and connecting. Highlighting was used to focus attention on a part of
the listing, or to make it easier to find again in the future. Subjects
made use of multiple colors, and underlined, boxed, or shaded the
desired object. Grouping generally consisted of drawing a box or
brackets around some items in the listing to identify them as a
cohesive unit. Frequently, text was attached to explain the
significance of the grouping. Connecting was usually done with an
arrow, either between objects in the listing, or between an object and
a textual comment added by the subject to describe the object.

Text was used to record any discoveries the subject deemed worth
remembering. This included both labels and longer comments or
explanations. It was scribbled wherever there was space, and connected
to some designated object in the listing.


6.4  DESIRED  DOCUMENTATION

The  general  attitude  of  the  subjects  towards  documentation  can  best 
be  summed  up  by  a  comment  one  of  them  made  on  the  questionnaire:  "I  know 
of  only  one  type  of  documentation  that  is  not  especially  helpful:  wrong 
documentation."  The  only  real  concern  any  subject  expressed  was  that 
the  programmer  might  be  overloaded  with  irrelevant  information.  Many 
types  of  documentation  were  suggested  by  the  subjects,  the  most  novel 
perhaps  being  detailed  history  information,  including  answers  to  such 
questions  as:  who  wrote  it,  when  was  it  written,  how  was  it  tested,  did 
it  ever  work  before,  were  existing  subroutine  libraries  used,  etc.  A 
good  knowledge  of  this  type  of  history  can  strongly  influence  what
types  of  bugs  are  suspected.


6.5  DESIGN  OF  THE  STUDY 

In order to allow fine tuning of the study, a question about the format
of the study itself was included on the questionnaire. After each
trial, the recommendations were considered, and slight alterations were
sometimes made to the study. Our major concern was that the subjects
find the experience as "natural" as possible.

The  principal  findings  here  were  that  the  subjects  had  no  problem
understanding  the  task  or  performing  it  with  an  observer  present.  The 
need  to  talk  out  loud  while  studying  the  program  was  not  viewed  as  a 
significant  inconvenience,  and  in  general,  subjects  felt  they  performed 
as  they  would  have  given  a  comparable  real  world  task. 

The major caveat to this appraisal was that normally the subjects would
expect to have better tools. In particular, real debugging would not
get very far without a run-time environment. A large effort was
required by the subjects to perform hand simulations of the code, and
the errors they made in the process complicated the effort of finding
the mistake in the program (as well as straining their ability to
concentrate on the task at hand). Even without the facilities to run
programs, subjects would have greatly appreciated a standard text
editor with its basic string search capabilities.

Subjects also felt that the questionnaire was not sharply enough
focused, a problem which we felt derived from the exploratory nature of
the study. The length of the questionnaire also had its repercussions;
for instance, few of the subjects gave interesting responses to the
final question on vocabulary.


7.  IMPLICATIONS  FOR  THE  PRL/EPM

This  study  has  several  implications  for  the  design  of  both  the  PRL 
and  the  EPM.  Some  of  these  are  confirmations  of  assumptions  and  biases 
we  have  been  working  with  since  the  start  of  the  project;  others  are 
genuinely  new  issues  raised  by  the  performance  of  the  subjects  in  this 
test. 


In  the  confirmation  category  there  were  two  major  observations: 


1.  The  multiple  views  of  the  EPM  are  useful. 

The  subjects  really  did  look  for  information  at  all  the  levels 
from  simple  text  string  searches  up  to  searches  through  all 
instances  of  some  cliche  action.  Examples  of  PRL  operations  they 
performed  by  hand  include:  "Visit  in  sequence  all  the  functions 
called  from  this  function,"  "Highlight  all  the  exits  from  this 
loop,"  and  "Visit  all  the  places  Event-Queue  has  its  value 
changed."

2.  Documentation  is  critically  important. 

The lack of documentation in this study highlighted the importance of
this information source. Even the small packet provided was a major
improvement over the pilot trial, where there was no documentation at
all. Some of the subjects' specific requests for information, such as
about the history of the program, could reasonably be kept available as
documentation, easing the task of debugging considerably. Making all
the information available on-line would clearly be a major advance.
These are issues we are considering both in the IPE project and in a
separate project called the "Documentation Assistant".
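
Two of the hand-performed operations quoted under the first
observation ("Visit in sequence all the functions called from this
function" and "Visit all the places Event-Queue has its value changed")
can be approximated by walking a program represented as list structure.
The following Common Lisp sketch is deliberately naive and is not the
PRL or EPM mechanism; the names walk-forms, called-functions,
assignment-sites, and *sample-definition* are assumptions introduced
only for illustration.

```lisp
;; Naive code walker: treats every nested list as a form and ignores
;; the real syntax of special forms.  All names are hypothetical.

(defun walk-forms (fn form)
  "Call FN on FORM and on every list nested inside it, skipping quoted data."
  (when (and (consp form) (not (eq (car form) 'quote)))
    (funcall fn form)
    ;; The ATOM test lets the loop stop cleanly on dotted lists too.
    (do ((tail form (cdr tail)))
        ((atom tail))
      (walk-forms fn (car tail)))))

(defun called-functions (definition)
  "List, without duplicates, the symbols found in operator position in the
body of DEFINITION, a form shaped like (defun name (args) body...)."
  (let ((found '()))
    (dolist (body-form (cdddr definition))
      (walk-forms (lambda (form)
                    (when (symbolp (car form))
                      (pushnew (car form) found)))
                  body-form))
    (nreverse found)))

(defun assignment-sites (variable forms)
  "Return every form (setq VARIABLE ...) found anywhere inside FORMS.
Only the first variable of a multi-pair SETQ is examined."
  (let ((sites '()))
    (dolist (top-form forms)
      (walk-forms (lambda (form)
                    (when (and (eq (car form) 'setq)
                               (eq (cadr form) variable))
                      (push form sites)))
                  top-form))
    (nreverse sites)))

;; A small stand-in for a function definition, held as quoted data.
(defparameter *sample-definition*
  '(defun prosper (events-queue)
     (prog (cell)
      foo
        (cond ((null events-queue) (return nil)))
        (setq cell (top-cell events-queue))
        (setq events-queue (rest events-queue))
        (go foo))))

;; "Visit all the places events-queue has its value changed."
(assignment-sites 'events-queue (list *sample-definition*))
```

Because the walker does not understand special forms, called-functions
also reports operators such as prog and cond, and mistakes the prog
variable list for a call. A usable tool would need the kind of
syntactic knowledge that the syntax representation of the EPM is
meant to supply.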

Though not particularly surprising, these results are relevant to our
efforts, and tend to support the assumptions on which we have based
much of our work. There were four areas where this study yielded new
results:


1.  A  sampling  of  vocabulary  for  the  PRL  was  gathered. 

Of the new results from this study, the most important, and the one
that most directly motivated the study in the first place, was the
sampling of programmer vocabulary. The natural vocabulary we gathered
turned out to be drawn from the five distinct categories presented
earlier: computer language specific terminology, computer programming
cliches, domain specific terminology, natural forms of reference, and
user specific views of the world.

Of these five categories of vocabulary, the EPM directly provides
representations for the first three: the syntax representation of the
EPM provides computer language specific terminology; the typical
programming pattern representation provides terminology for programming
cliches; and intentional aggregates provide domain specific
terminology. The remaining two categories are not directly addressed in
the current EPM design. To allow the user full freedom of expression
would require the PRL to deal with all the intricacies of natural
language processing (a currently unsolved problem); moreover, when
using a keyboard to enter queries and commands, it is not clear that a
user wants to type out full sentences (or even sentence fragments). The
PRL would require a very specialized user model to allow users to talk
about the program in their own highly stylized way.

2.  Individual  differences  imply  need  for  customization. 

The  study  showed  some  of  the  ways  that  individual  programmers 
vary  in  work  style.  To  support  programmers  effectively,  it  appears 
necessary  to  provide  for  customization  of  the  work  environment.  An 
intelligent  programming  environment  might  maintain  a  user  profile
based  either  on  explicit  user  requests  or,  in  an  advanced  system,
on  autonomous  observations.  While  some  existing  editors
allow  a  small  amount  of  individual  user  control  over  the  behavior 
of  some  features,  none  have  extensive  user  models. 

3.  Context-sensitive  information  management  is  important. 

There are three major open questions on this issue. When should
information be available but hidden? Subjects indicated a desire for
many types of documentation, but they did not want to see all of them
all of the time. When should information be elided? In order to fit on
the screen, code and documentation may have to be condensed. When
should information be forgotten? While much of what subjects wrote on
their listings was intended to be permanent, sometimes they made
assumptions or drew conclusions which they later wanted to change. They
also frequently made notations that were only intended to be temporary
reminders of some postponed task.

4.  Useful  user  interface  features  were  identified. 

The study pointed out the need for sophistication in the user
interface. Information must be managed not only internally, but also in
its presentation on the screen. The user should be allowed to work in
the familiar paper and pencil mode if and when appropriate, and should
be able to call up all relevant documentation on-line, but should not
be overwhelmed by a screen cluttered with everything the system has
stored about some piece of a program.


8.  SUMMARY


The key results of this study were (1) a confirmation of the
usefulness of the conceptual mechanisms provided by the Extended
Program Model, and (2) identification of new areas and issues important
for the development of the PRL and EPM.

The  new  issues  outline  a  program  of  further  work  to  pursue  in  the 
continuation  of  the  PRL  project.  The  vocabulary  lists  have  already 
spurred  the  development  of  a  tentative  formal  syntax  for  queries  in  PRL. 
Further  analysis  will  lead  to  refinement  of  this  specification,  and 
eventually  to  an  extended  and  modified  version  which  will  define  the 
user  language.  Observations  about  how  practicing  programmers  go  about 
understanding  programs  will  provide  significant  guidance  on  future  PRL 
work.  Other  new  issues  will  influence  design  work  on  the  IPE,  which 
continues  under  a  separate  contract.


9.  PLANS  FOR  FURTHER  DEVELOPMENT


Further work on the PRL will concentrate on specifying the formal
syntax, complete basic vocabulary, and external user syntax for the
language. The vocabulary data from this study provide a good starting
point for such an effort. We plan to define a relatively simple,
strongly constrained syntax for the system's internal use, while
providing a looser, more forgiving syntax for the user. Given these two
levels of the language, we must design a method for mapping between
them.
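
As a rough illustration of what a mapping between the two levels might
involve, the Common Lisp sketch below reduces a loose, user-level
request to a constrained internal form by discarding filler words and
replacing verb synonyms with a canonical operation name. The word
lists, the operation names visit and highlight, and the function
normalize-query are all assumptions made for this example; the report
does not define the actual syntax of either level.

```lisp
;; Hypothetical two-level query mapping.  Nothing here is PRL syntax.

(defparameter *noise-words* '(me all the of in please)
  "Words a user may type that carry no meaning for the internal form.")

(defparameter *synonyms*
  '((show . visit) (find . visit) (goto . visit)
    (mark . highlight) (underline . highlight))
  "Assumed mapping from loose user verbs to canonical internal operations.")

(defun normalize-query (words)
  "Map a loose user query, given as a list of symbols, to a constrained
internal form: noise words are dropped and synonyms are canonicalized."
  (let ((kept '()))
    (dolist (word words)
      (unless (member word *noise-words*)
        ;; Use the canonical name when one is known, else keep the word.
        (push (or (cdr (assoc word *synonyms*)) word) kept)))
    (nreverse kept)))

;; Usage: a forgiving user phrasing and its constrained counterpart.
(normalize-query '(show me all the callers of prosper))
;; evaluates to (VISIT CALLERS PROSPER)
```

A table-driven mapping of this kind handles only vocabulary variation;
it does nothing for anaphora, deixis, or idiosyncratic user models,
which is consistent with the decision stated below not to pursue
unconstrained natural language.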

There are several issues highlighted by this study that we will not
pursue. The first of these is the need for strong support in the
debugging task, ideally in the form of a dynamic debugging environment.
We also do not plan to tackle head-on the problem of processing
unconstrained natural language, which is a large area of research that
is not directly related to the PRL. Finally, we do not intend to model
each individual user so as to understand their personal idiosyncratic
vocabularies.

We remain uncommitted as to how much of the high level modeling of
users and domains we will be able to handle. These have the potential
for significant payoff, and are the most likely areas for introducing
additional intelligence into the system, beyond its basic understanding
of the programming domain itself. Such modeling is, like natural
language processing, a major research effort in its own right, but one
which is more directly germane to the goal of developing intelligent
aids to software comprehension.


10.  PERSONNEL


10.1  PERSONNEL 


The Program Reference Language (PRL) research project is being
performed within the User Aids Program of AI&DS, with Dr. Brian P.
McCune, Program Manager, as Principal Investigator. Other members of
the AI&DS technical staff who are contributors to the project include
Jeffrey S. Dean, Eric A. Domeshek, Michael A. Brzustowicz, and Daniel
G. Shapiro.


Dr. Brian P. McCune is the Principal Investigator of the PRL project.
He received his Ph.D. in Computer Science from Stanford University in
1979; the title of his thesis was "Building Program Models
Incrementally from Informal Descriptions." During the past decade, Dr.
McCune has done research in the areas of artificial intelligence,
software systems, and computer architecture, with emphasis on
artificial intelligence approaches to software development and
maintenance, information retrieval, database management, hypothesis
formation, planning, and distributed processing. He has been the
principal investigator of research projects to select and design
candidate AI tools for assisting in the maintenance of Ada programs
(sponsored by Rome Air Development Center), to design an intelligent
program editor for Ada, to determine the feasibility of automatically
generating operating systems, and to design and implement a
knowledge-based system for textual information retrieval. Dr. McCune is
an Associate Editor of The AI Magazine. He has been invited to discuss
the application of artificial intelligence to defense problems numerous
times, both at workshops and in published papers.

Jeffrey S. Dean has recently begun to play a key role in the PRL
project; he is currently leading the related Intelligent Program Editor
project, and was previously the leader of the AI&DS Software
Maintenance Project, which defined advanced Ada tools for software
maintenance. He received his Master's degree in Computer
Science/Computer Engineering from Stanford University, where he worked
on the automatic derivation of operating systems. His main research
interest is the application of AI to software tools. He came to AI&DS
in January 1981 from Bell Telephone Laboratories, where he was involved
in the development and maintenance of the UNIX operating system and its
utilities.

Daniel G. Shapiro has been contributing to the PRL project since
joining AI&DS in October 1981, after receiving a Master's degree in
electrical engineering and computer science from the Massachusetts
Institute of Technology. His research interests include artificial
intelligence, expert systems, and software engineering. At AI&DS he has
done work on expert systems for program and documentation editing,
information retrieval, and mission planning. His master's thesis,
entitled "Sniffer: A System that Understands Bugs," involved the design
and implementation of a semantics-based debugger for the Programmer's
Apprentice project at the MIT Artificial Intelligence Laboratory. He
also taught software engineering courses at MIT.

Eric  A.  Domeshek  was  responsible  for  much  of  the  PRL  experiment 
which  studied  how  people  think  about  programs.  Mr.  Domeshek  received  an 
A.B.  in  Physics  from  Harvard  College.  His  course  work  also  emphasised 
computer  science  and  cognitive  science.  His  technical  interests  are  in 
Artificial  Intelligence,  particularly  knowledge  representation,  and  in 
computer  graphics. 

Michael A. Brzustowicz has been involved with the PRL project since
joining AI&DS in November 1983. He received an S.B. degree in Physics
from the Massachusetts Institute of Technology in 1979 and received his
M.S.E.E. in Computer Engineering from Carnegie-Mellon University in
1980; his thesis work was entitled "A System for the Implementation of
Models of Reasoning with Uncertain Data." Mr. Brzustowicz's current
areas of interest include Artificial Intelligence, Software
Engineering, Ergonomic User Interfaces, and Computer Aided Processes.
Prior to joining AI&DS, Mr. Brzustowicz worked for the Development
Systems Software Group of the Semiconductor Division of Texas
Instruments, and for the Unix Development Group at Bell Laboratories.


10.2  INTERACTIONS

Dr.  Brian  P.  McCune  is  an  Associate  Editor  of  The  AI  Magazine,  the 
publication  of  the  American  Association  for  Artificial  Intelligence.  He 
is  on  the  Editorial  Advisory  Board  of  Defense  Electronics  and  also  The
Artificial  Intelligence  Report.

Dr.  McCune  was  an  invited  speaker  to  COMPSAC  '83  (November  1983) 
and  EASCON  '83  (September  1983),  and  was  an  invited  participant  in  the
Knowledge  Based  Software  Assistant  Workshop  at  AAAI-83  (August  1983).
He  attended  the  NAVAIR/ONR  Aviation  Software  Workshop  (October  1983), 
the  DARPA  Formalized  Software  Development  Workshop  (November  1983),  the 
Conference  on  Inference  Theory  and  AI  (November  1982),  and  the  Software 
Maintenance  Workshop  (December  1983). 

In addition to lectures associated with papers that appeared in
published conference proceedings, project staff members have given
numerous lectures around the country. Dr. McCune has been lecturing
throughout the federal government on software maintenance and
intelligence problems and the potential of artificial intelligence to
help solve them. Along with Daniel G. Shapiro, he presented results
from the PRL project to Dr. Northrup Fowler III and Douglas White of
RADC at AI&DS in December 1982.

Dr. McCune was one of twelve technologists selected to participate in
the Government-sponsored Conference on Inference Theory and Artificial
Intelligence, held in Leesburg, Virginia, in November 1982 to discuss
how artificial intelligence, decision analysis, and inference theory
might be combined to enhance the production of intelligence. Dr. McCune
attended the DoD Software Initiative Workshop in Raleigh, North
Carolina, in February 1983.

Dr. McCune attended the Eighth International Joint Conference on
Artificial Intelligence (IJCAI-83), held in Karlsruhe, Germany, in
August 1983; the National Conference on Artificial Intelligence
(AAAI-83), Washington, D.C., August 1983; and the Symposium on
Intelligence Applications of Advanced Computer and Information
Technology: Focus on Artificial Intelligence, sponsored by the Offices
of Research and Development and Scientific and Weapons Research,
Central Intelligence Agency, and held in Washington, D.C., November
1982.

Dr. McCune has been interfacing heavily with both operational and
developmental commands in the Air Force and elsewhere in DoD and
industry in order to understand current and future problems of software
development and maintenance. Within the Air Force, Dr. McCune has met
with personnel at the Air Force Office of Scientific Research, Rome Air
Development Center, Wright Aeronautical Laboratories, Foreign
Technology Division, Strategic Air Command headquarters, Air Force
Communications Computer Programming Center, and Air Force Satellite
Control Facility. Elsewhere in DoD he has talked with the Defense
Intelligence Agency, Office of the Undersecretary of Defense for
Research and Engineering, Defense Advanced Research Projects Agency,
DoD STARS Program, Ada Joint Program Office, Office of Naval Research,
Naval Electronics Systems Command, Naval Sea Systems Command, Naval
Intelligence Command, Naval Research Laboratory, Naval Ocean Systems
Center, Naval Intelligence Center, Naval Weapons Center, Army Research
Office, Army Center for Tactical Computer Systems, and Army Ballistic
Missile Defense Advanced Technology Center.

Dr. McCune has also visited numerous universities and research centers
to assess the state of the art in automatic programming at first hand.
Places visited include Harvard University, Massachusetts Institute of
Technology, Carnegie-Mellon University, Duke University, University of
California at Irvine, and Stanford University.

Jeffrey S. Dean presented a paper on a study of software maintenance
at the Software Maintenance Workshop (December 1983). He attended the
Symposium for Application and Assessment of Automated Tools for
Software Development (November 1983); AAAI-83; IJCAI-81; the Working
Conference on Automated Tools for Information Systems Design and
Development, held in New Orleans in January 1982 and sponsored by IFIP
Working Group 8.1 on Design and Evaluation of Information Systems; and
UNICOM, the semiannual UNIX users' conference (January 1983).

Daniel G. Shapiro was a panelist at the ACM SIGSOFT/SIGPLAN Software
Engineering Symposium on High-Level Debugging, held in Pacific Grove,
California, in March 1983. He presented papers on the PRL at the IEEE
Trends and Applications Conference (May 1983) and the Seventh
International Conference on Software Engineering (March 1984). He
presented papers on information retrieval at AAAI-83 and IJCAI-83.

Eric A. Domeshek attended the Symposium for Application and Assessment
of Automated Tools for Software Development (November 1983) and
AAAI-83.

Michael  A.  Brzustowicz  attended  the  Symposium  for  Application  and 
Assessment  of  Automated  Tools  for  Software  Development  (November  1983). 


10.3  PUBLICATIONS 


Members  of  PRL  project  staff  have  published  a  number  of  papers; 
reprints  of  those  papers  most  relevant  to  the  PRL  project  have  been 
included  as  Appendices  to  this  proposal.  A  cumulative  chronological 
list  of  publications  appearing  in  technical  journals  and  conference 
proceedings  is  given  below:

Daniel  G.  Shapiro,  Jeffrey  S.  Dean,  and  Brian  P.  McCune,  "A  Knowledge 
Base  for  Supporting  an  Intelligent  Program  Editor,"  7th  International 
Conference  on  Software  Engineering,  March  1984. 

Andrew S. Cromarty, Daniel G. Shapiro, and Michael R. Fehling, "Still
Planners Run Deep: Shallow Reasoning for Fast Replanning," Proceedings,
Society of Photo-Optical Instrumentation Engineers, Technical Symposium
East, 1984, to appear.

Jeffrey S. Dean and Brian P. McCune, "An Informal Study of Software
Maintenance Problems," Proceedings, Software Maintenance Workshop,
December 1983.

Brian P. McCune and Jeffrey S. Dean, "Trends for Advanced Software
Tools," Defense Science 2001+ (reprint of EASCON '83 paper), December
1983.

Brian P. McCune, Richard M. Tong, Jeffrey S. Dean, and Daniel G.
Shapiro, "RUBRIC: A System for Rule-Based Information Retrieval,"
Proceedings, COMPSAC 1983, November 1983.

Brian P. McCune and Jeffrey S. Dean, "Trends for Advanced Software
Tools," invited paper, Proceedings, EASCON '83, September 1983.

Richard M. Tong, Daniel G. Shapiro, Brian P. McCune, and Jeffrey S.
Dean, "A Rule-Based Approach to Information Retrieval: Some Results and
Comments," Proceedings, National Conference on Artificial Intelligence,
Washington, D.C., August 1983.

Richard M. Tong, Daniel G. Shapiro, Jeffrey S. Dean, and Brian P.
McCune, "Performance Analysis of a Rule-Based Information Retrieval
System," 1983 National Conference on Artificial Intelligence,
Washington, D.C., August 1983.

Richard M. Tong, Daniel G. Shapiro, Jeffrey S. Dean, and Brian P.
McCune, "A Comparison of Uncertainty Calculi in an Expert System for
Information Retrieval," Eighth International Joint Conference on
Artificial Intelligence, Karlsruhe, West Germany, August 1983.


(declare
  (special events-queue div-time grid transform-lib directions
           north south east west prosper-display-buffer))

(defun outer-prosper (&optional (evq (make-an-evq)))
  (create-prosper-display-buffer)
  (prosper evq))

(defun prosper (events-queue)
  ((lambda (transform-lib grid)
     (prog (matches cell div-time)
           (grid-init events-queue grid)
      foo
           (cond ((null events-queue) (return nil)))
           (display-grid grid)
           (setq cell (top-cell events-queue))
           (setq div-time (top-time events-queue))
           (setq events-queue (rest events-queue))
           (setq matches (find-transforms cell transform-lib))
           (apply-transforms matches cell grid)
           (go foo)))
   (create-transform-lib)
   (create-grid)))

;the def for metastasize can be snarfed off of the plans
;as well as the def for events-queue-insert

(defun grid-init (evq grid)
  (do ((q evq (rest q))
       (tope nil))
      ((null q) nil)
    (setq tope (top-cell q))
    (ht-insert (cell-location tope) tope grid)))

(defun create-grid ()
  (let ((g (array g t 21)))
    g))

(defun top-time (evq) (cond ((null evq) nil) ((car (car evq)))))
(defun top-cell (evq) (cond ((null evq) nil) ((cdr (car evq)))))
(defun rest (evq) (cond ((null evq) nil) ((cdr evq))))

(defun find-transforms (key-cell tlib)
  (ht-lookup-all (cell-type key-cell) tlib))

(defun apply-transforms (candidates key-cell grid)
  ;run the filter function associated with each candidate.
  ;if it succeeds, apply the transform to the bindings returned by the
  ;filter function. only run the first matching transform
  (do ((trs candidates (cdr trs))
       (transform nil)
       (bindings nil))
      ((null trs) nil)
    (setq transform (car trs))
    (setq bindings (apply-selector (car transform) key-cell grid))
    ;bindings is a list of cells which form the context for the key cell
    ;ALL selector functions have the key-cell as the last element of the
    ;bindings returned
    (cond (bindings (return (apply (car (cdr transform)) bindings))))))

(defun apply-selector (selector-fun key-cell grid)
  ;takes care of mapping the selector function all possible ways
  ;onto the grid
  (do ((nominal-north (+ 1 (random 4)) (+ 1 nominal-north))
       (tries 1 (+ 1 tries))
       (win nil))
      ((> tries 4) nil)
    (let ((north (nth nominal-north directions))
          (west (nth (+ nominal-north 1) directions))
          (south (nth (+ nominal-north 2) directions))
          (east (nth (+ nominal-north 3) directions)))
      (and (setq win (funcall selector-fun key-cell grid))
           (return win)))))

(defun north (loc) (loc+ north loc))
(defun south (loc) (loc+ south loc))
(defun east (loc) (loc+ east loc))
(defun west (loc) (loc+ west loc))

(defun north-cell (c g) (ht-lookup (north (cell-location c)) g))
(defun south-cell (c g) (ht-lookup (south (cell-location c)) g))
(defun east-cell (c g) (ht-lookup (east (cell-location c)) g))
(defun west-cell (c g) (ht-lookup (west (cell-location c)) g))

(defun display-grid (grid)
  (clear-prosper-display-buffer)
  (map-over-all-ht-datums grid 'place-cell-in-buffer)
  (print-prosper-display-buffer))

(defun events-queue-insert (item time evq)
  (prog (nq oq entry)
    (setq entry (cons time item))
    (cond ((or (null evq) (before? entry (car evq)))
           (return (cons entry evq))))
    (setq nq (cdr evq))
    (setq oq evq)
 lp (cond ((or (null nq) (before? entry (car nq)))
           (rplacd oq (cons entry nq))
           (return evq)))
    (setq oq nq)
    (setq nq (cdr nq))
    (go lp)))

(defun before? (item1 item2)
  (< (car item1) (car item2)))

(defun make-cell (type loc divs) (copy-the-damn-thing (list type loc divs)))
(defun cell-type (cell) (car cell))
(defun division-count (cell) (car (cdr (cdr cell))))
(defun cell-location (cell) (car (cdr cell)))
(defun change-cell-type (cell new-type) (rplaca cell new-type))
(defun increment-division-count (cell)
  (rplaca (cdr (cdr cell)) (+ 1 (division-count cell))))
(defun create-cancer-cell () (copy-the-damn-thing '(c () 1)))

(defun make-location (x y) (list x y))
(defun loc-x (loc) (car loc))
(defun loc-y (loc) (car (cdr loc)))
(defun loc+ (l1 l2) (make-location (+ (loc-x l1) (loc-x l2))
                                   (+ (loc-y l1) (loc-y l2))))
(defun loc- (l1 l2) (make-location (- (loc-x l1) (loc-x l2))
                                   (- (loc-y l1) (loc-y l2))))

;;;;;;;;;;;;;;; hash table abstraction ;;;;;;;;;;;;;;;
; this hash table indexes K-entities (representing monocells) by K-datums
; which represent cell locations

(defun  ht-lookup  (key  ht) 

;find  the  appropriate  bucket,  then  search  it  for  a  tag  which 

;is  K-equal  key.  Return  the  cdr  of  the  alist  element  if  one  is  found. 

(let  ((bucket  (ht  (bucket-select  key  ht))) 

(item  nil)) 

(setq  item  (assoc  key  bucket)) 

(and  item  (cdr  item)))) 

(defun  ht-lookup-all  (key  ht) 

(let  ((bucket  (ht  (bucket-select  key  ht)))) 

(cond  ((null  bucket)  nil) 

((mapcar

'(lambda  (x)  (cond  ((equal  (car  x)  key)  (cdr  x)))) 
bucket))))) 

(defun  ht-delete  (key  ht) 

; Find the item (a monocell) in the bucket indexed by (k-equal key).

;  Splice  it  out  if  it  is  there. 

(let  ((bucket-num  (bucket-select  key  ht)) 

(bucket  nil)) 


(setq  bucket  (ht  bucket-num)) 

(cond  ((and  bucket  (k-equal  (caar  bucket)  key)) 

(store  (ht  bucket-num)  (cdr  bucket))) 

((do  ((old  bucket  new) 

(new  (cdr  bucket)  (cdr  new))) 

((null  new)  nil) 

(and  (k-equal  (caar  new)  key) 

(return  (rplacd  old  (cdr  new))))))) 

key ) ) 

(defun  ht-insert  (key  item  ht) 

(let  ((bucket-num  (bucket-select  key  ht)) 

(pair  (cons  key  item))) 

(store  (ht  bucket-num)  (cons  pair  (ht  bucket-num))) 
key) ) 

(defun  bucket-select  (key  ht) 

(remainder  (sxhash  key)  21)) 

(defun  sxhash  (key) 

(apply '+ (exploden key)))

(defun  ht-dump  (ht) 

(do  ( (i  0  (+  1  i))) 

((>  i  20)) 

(print  (ht  i)) 

(terpri) )) 

(defun map-over-all-ht-datums (ht fun)

(do  ((i  0  (+  i  1))) 

((>  i  20)) 

(mapcar  '(lambda  (x)  (funcall  fun  (cdr  x)))  (ht  i)))) 

;(defun ht-create ()
;  (let ((g (array nil t 21)))
;    g))

;STUB ALERT
(defun k-equal (a1 a2) (equal a1 a2)) ;a stub

;;;;;;  the  other  prosper  functions  ;;;;;;; 

(defun make-room-between (c1 c2 g)
  (let ((addend (loc- (cell-location c2) (cell-location c1))))
    (push-out c2 addend g)))

(defun push-out (cell addend grid)
  (let ((new-loc (loc+ (cell-location cell) addend)))
    (cond ((null (ht-lookup new-loc grid)) (grid-insert cell new-loc grid))
          (t (push-out (ht-lookup new-loc grid) addend grid)
             (grid-insert cell new-loc grid)))))


(defun grid-insert (cell location grid)
  ;remove the cell from its old location in the grid,
  (ht-delete (cell-location cell) grid)
  (ht-delete location grid) ;remove whatever cell currently lurks at location
  ;side effect the cell!
  (move-cell cell location grid))

(defun  move-cell  (cell  loc  grid) 

;just  blithely  crams  the  new  fella  into  the  grid 
;side  effects  the  cell 

(rplaca  (cdr  cell)  loc)  ;  (k-eval  '(rplaca  (cdr  .cell)  loc)) 
(ht-insert  loc  cell  grid)) 

;(defun make-location (x y)
;  (k-cons x y))

;(defun loc-y (location) (k-car (k-cdr location)))
;(defun loc-x (location) (k-car location))

(defun make-an-evq ()
  (copy-the-damn-thing
    '((0 c (0 0) 1) (3 a (0 1) 1) (3 a (0 -1) 1) (3 a (1 0) 1) (4 a (-1 0) 1))))

(defun  copy-the-damn-thing  (thing) 

(cond  ((dtpr  thing) 

(cons  (copy-the-damn-thing  (car  thing)) 

(copy-the-damn-thing  (cdr  thing)))) 

(t thing)))

(defun create-prosper-display-buffer ()
  (defprop prosper-display-buffer 15 width)
  (defprop prosper-display-buffer 15 height)
  (setq prosper-display-buffer (array prosper-display-buffer t
                                      (get 'prosper-display-buffer 'width)
                                      (get 'prosper-display-buffer 'height))))

(defun clear-prosper-display-buffer (&aux width height)
  (fillarray prosper-display-buffer (list '| |)))

(defun place-cell-in-buffer (cell &aux (cell-loc (cell-location cell))
                                  width height x-pos y-pos)
  (setq width (get 'prosper-display-buffer 'width))
  (setq height (get 'prosper-display-buffer 'height))
  (setq x-pos (+ (loc-x cell-loc) (fix (/ width 2))))
  (setq y-pos (+ (loc-y cell-loc) (fix (/ height 2))))
  (store (prosper-display-buffer y-pos x-pos) (cell-type cell)))


(defun print-prosper-display-buffer (&aux width height)
  (setq width (sub1 (get 'prosper-display-buffer 'width)))
  (setq height (sub1 (get 'prosper-display-buffer 'height)))
  (do i 0 (add1 i) (>& i height)
    (terpri)
    (do j 0 (add1 j) (>& j width) (princ (prosper-display-buffer i j))))
  (terpri))

;;;;;;;;; transforms and selector functions ;;;;;;;;

(defun a-cell-with-room-to-grow (key-cell grid)

(let  ((north-neighbor  (north-cell  key-cell  grid))) 

(and  (eq  (cell-type  key-cell)  'a) 

(null  north-neighbor) 

(list  (north  (cell-location  key-cell))  key-cell)))) 

(defun grow-A-cell (empty-location key-cell)
  (increment-division-count key-cell)
  (prog (new-cell)
    (setq new-cell (make-cell 'a empty-location 1))
    (grid-insert new-cell (cell-location new-cell) grid)
    (setq events-queue (events-queue-insert new-cell (+ div-time 5) events-queue))
    (setq events-queue (events-queue-insert key-cell (+ div-time 5) events-queue))))

(defun  gotta-b-neighbor  (key-cell  grid) 

(let  ((north-neighbor  (north-cell  key-cell  grid))) 

(and  (eq  (cell-type  north-neighbor)  'b)  (list  key-cell)))) 

(defun age-prematurely (key-cell)
  (increment-division-count key-cell)
  (increment-division-count key-cell)
  (increment-division-count key-cell)
  (setq events-queue (events-queue-insert key-cell (+ div-time 3) events-queue)))

(defun surrounded-by-A-cells (key-cell grid
                              &aux (key-loc (cell-location key-cell)))
  ;a filter function for the carcinoma transform
  ;returns a list of cells which are the context for the metast
  ;transform
  (let ((tc (north-cell key-cell grid))
        (bc (south-cell key-cell grid))
        (rc (east-cell key-cell grid))
        (lc (west-cell key-cell grid)))
    (and
      (not (eq (cell-type key-cell) 'c))
      tc (eq (cell-type tc) 'a)
      bc (eq (cell-type bc) 'a)
      rc (eq (cell-type rc) 'a)
      lc (eq (cell-type lc) 'a)
      (list key-cell))))

(defun carcinoma (key-cell)
  (increment-division-count key-cell)
  (change-cell-type key-cell 'c)
  (setq events-queue (events-queue-insert key-cell (+ div-time 1) events-queue)))

(defun enclosed-cancer-cell (key-cell grid)
  (let ((tc (north-cell key-cell grid))
        (bc (south-cell key-cell grid))
        (rc (east-cell key-cell grid))
        (lc (west-cell key-cell grid)))
    (and
      (eq (cell-type key-cell) 'c)
      tc
      bc
      rc
      lc
      (list tc key-cell))))

(defun cancer-cell-with-one-neighbor (key-cell grid)

(let  ((buddy  (west-cell  key-cell  grid))) 

(and  buddy  (list  buddy  key-cell)))) 

(defun metastasize (right-cell key-cell)
  ((lambda (new-cell location)
     (increment-division-count key-cell)
     (make-room-between key-cell right-cell grid)
     (grid-insert new-cell location grid)
     (events-queue-insert new-cell (+ div-time 2) events-queue)
     (events-queue-insert key-cell (+ div-time 2) events-queue))
   (create-cancer-cell) (cell-location right-cell)))

(defun  old-aged-cell  (key-cell  grid) 

(and  (>  (division-count  key-cell)  4)  (list  key-cell))) 

(defun  die  (key-cell) 

(ht-delete  (cell-location  key-cell)  grid)) 

(defun  cancer-cell-filter  (key-cell  grid) 

(cond  ((eq  (cell-type  key-cell)  'c)  (list  key-cell)))) 

(defun  cancer-cells-never-die  (key-cell) 

(setq  events-queue  (events-queue-insert  key-cell  (+  div-time 

5)  events-queue))) 


(defun create-transform-lib (&aux tl)
  (setq tl (array tl t 21))
  ; (ht-insert 'b (list 'surrounded-by-A-cells 'carcinoma) tl)
  (ht-insert 'a (list 'surrounded-by-A-cells 'carcinoma) tl)
  (ht-insert 'a (list 'a-cell-with-room-to-grow 'grow-A-cell) tl)
  ; (ht-insert 'f (list 'surrounded-by-A-cells 'carcinoma) tl)
  ; (ht-insert 'c (list 'gotta-b-neighbor 'age-prematurely) tl)
  ; (ht-insert 'c (list 'cancer-cell-filter 'cancer-cells-never-die) tl)
  (ht-insert 'c (list 'cancer-cell-with-one-neighbor 'metastasize) tl)
  ; (ht-insert 'c (list 'enclosed-cancer-cell 'metastasize) tl)
  tl)

; (ht-insert 'c (list 'old-aged-cell 'die) tl)


;; unused  transforms  ;;; 

(defun  cees-abound  (key-cell  grid) 

(let  ((neighbor  (west-cell  key-cell  grid))) 

(and  neighbor  (eq  (cell-type  neighbor)  'c)  (eq  (cell-type  key-call) 
'c) 

(list  neighbor 

key-cell)))) 

;;;;;;; globals ;;;;;;;

(setq directions (list (make-location 0 1)
                       (make-location -1 0)
                       (make-location 0 -1)
                       (make-location 1 0)))
(rplacd (last directions) directions)
(setq north (nth 1 directions)
      west (nth 2 directions)
      south (nth 3 directions)
      east (nth 4 directions))

(defun  block-a-cells  () 

(copy-the-damn-thing 

'((10  c  (0  0)  1)  (24  a  (0  1)  2)  (24  a  (0  -1)  2)  (24  a  (-1  0)  2) 

(24  a  (1  0)  2)  (24  a  (1  1)  2)  (24  a  (1  -1)  2)  (24  a  (-1  1)  2) 

(24  a  (-1  -1)  2)))) 
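
The ordered splice-in performed by events-queue-insert in the listing above can be restated in a more familiar notation. The following is a minimal Python sketch and is not part of the original program: the name events_queue_insert, the use of Python lists, and the (time, item) tuples are assumptions made for illustration. It assumes, as before? does in the listing, that a new entry goes ahead of the first queued entry with a strictly later time, so entries with equal times keep their arrival order.

```python
from typing import Any, List, Tuple

# Hypothetical stand-in for the (cons time item) entries of the listing.
Event = Tuple[int, Any]


def events_queue_insert(item: Any, time: int, evq: List[Event]) -> List[Event]:
    """Return a new queue with (time, item) spliced in by ascending time.

    The new entry is placed before the first existing entry whose time is
    strictly greater, mirroring the strict < test used by before?.
    """
    entry = (time, item)
    for i, existing in enumerate(evq):
        if entry[0] < existing[0]:
            # Splice the entry in ahead of the first later event.
            return evq[:i] + [entry] + evq[i:]
    # No later event was found, so the entry goes at the end.
    return evq + [entry]


if __name__ == "__main__":
    queue: List[Event] = []
    for name, when in [("c", 0), ("a1", 3), ("a2", 3), ("b", 1)]:
        queue = events_queue_insert(name, when, queue)
    print(queue)
```

As with the Lisp version, the caller must keep the returned queue, as the (setq events-queue ...) forms in the listing do.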


This  appendix  contains  the  questionnaire  used  in  the  study 
(described  earlier  in  this  document)  of  how  people  understand 
programs. 


DEBUGGING  EXPERIMENT  POST-MORTEM 


We've  taped  your  ramblings  as  you  debugged  the  program.  We've  harassed 
you  with  questions  as  you  tried  to  work.  We've  got  the  copy  of  the 
listing  that  you  marked  up.  We've  taken  notes  on  what  we  saw  you  doing. 
Now...

These  questions  are  to  be  answered  immediately  after  you  have  completed 
the  debugging  task.  Try  to  answer  them  as  completely  and  accurately  as 
possible.  This  is  our  last  chance  to  figure  out  what  you  thought  you 
were  doing  as  you  debugged  the  program. 

1)  What questions did you ask about the program's structure
    and design?


2)  What sort of vocabulary did you use to refer to objects
    in the program, and the relations between them?


3)  What sort of hypotheses did you construct, and how did you
    evaluate them?


4)  What aids for searching through the program would you have liked?


5)  Do you have any comments about the format of this experiment?
    Please vent your spleen here:

    a]  Suggest types of additional program documentation?

    b]  Would you like notes from author on program's intent?

    c]  Would labels that warn you about outdated code help?

    d]  Your gripe here...


6)  Programmers often find themselves in the situation of having to
maintain systems about which they know little.  This experiment was an
attempt to simulate that experience.  We are in the process of defining
a tool, called the PRL, to aid in program comprehension.  We are
soliciting your suggestions for such a tool, and your evaluation of our
vision of the PRL.

Please look over the lists below.  In it we have presented our breakdown
of the classes of objects and relations you might want to talk about in
analyzing a program.  Would it be useful to be able to search for these
types of objects and relations?  How natural is the vocabulary?  Feel
free to suggest synonyms or rephrasings you find more natural.  Also
please add any useful concepts you think we have left out.


A)  What types of objects would you like to be able to search for?

    Text Strings

    Syntactic Analysis
        Variable
        Function
        Let
        Loop
        Exits                    - "Show the exits from the splice-in
                                    loop of function FOO."

    Cliches
        List-traversal
        Ordered-list
        Splice-in
        Priority-queue
        Enqueue-operation
        Dequeue-operation
        Hash-table
        Production-system
        Pattern / Trigger
        Action / Transformation


B)  What types of relations are worth talking about?

    Functional Composition
        Calls
        Called-by
        Recursive
        Mutually-recursive
        Main-loop
        Top-level-subroutine

    Control Flow
        (Sometimes/Always) Calls         - "List the functions that
                                            function FOO always calls."
        (Sometimes/Always) Returns

    Data Flow
        (Sometimes/Always) Accesses      - "List the variables always
                                            accessed by function FOO."
        (Sometimes/Always) Changes       - "Find the variables sometimes
                                            changed by function FOO, and
                                            call it FOOLIST."
        (Sometimes/Always) Side-effects


C)  What forms of documentation would be especially helpful?

    Main-routine
    Data-structure
    Input
    Output
    Side-effect
    Precondition
    Assumptions
    Intentional-annotation   - Collects segments of code that
                               implement some particular purpose.
    Hook                     - A comment describing why some code,
                               not presently used, was designed in
                               to facilitate some future expansion.
                             - A history of revisions.
    Inactive-code




AD-A142 224   AN INFORMAL STUDY OF PROGRAM COMPREHENSION(U) ADVANCED
              INFORMATION AND DECISION SYSTEMS MOUNTAIN VIEW CA
              E A DOMESHEK ET AL.  MAR 84  AI/DS-TM-1014-3
UNCLASSIFIED  AFOSR-TR-84-0309  F49620-81-C-0067  F/G 12/1  NL


REPORT DOCUMENTATION PAGE

Security classification of this page:  UNCLASSIFIED

1a.  Report security classification:  UNCLASSIFIED
3.   Distribution/availability of report:  Approved for public release;
     distribution unlimited.
4.   Performing organization report number(s):  TM-1014-3
5.   Monitoring organization report number(s):  AFOSR-TR-84-0309
6a.  Name of performing organization:  Advanced Information and
     Decision Systems
6c.  Address (City, State and ZIP Code):  201 San Antonio Circle,
     Suite 286, Mountain View CA 94040-1270
7a.  Name of monitoring organization:  Air Force Office of Scientific
     Research
7b.  Address (City, State and ZIP Code):  Directorate of Mathematical &
     Information Sciences, Bolling AFB DC 20332
8a.  Name of funding/sponsoring organization:  AFOSR
8c.  Address (City, State and ZIP Code):  Bolling AFB DC 20332
9.   Procurement instrument identification number:  F49620-81-C-0067
10.  Source of funding nos.:  Program element no. 61102F; Project
     no. 2304; Task no. A7
11.  Title (Include Security Classification):  AN INFORMAL STUDY OF
     PROGRAM COMPREHENSION
12.  Personal author(s):  Brian P. McCune
13b. Time covered:  from 1/6/82 to 31/5/83
14.  Date of report (Yr., Mo., Day):  MAR 1984
19.  Abstract (Continue on reverse if necessary and identify by block
     number):

     During this period, the four investigators produced 11 papers with
     titles including, "The intelligent program editor - A knowledge
     based system for supporting program and documentation maintenance,"
     "The multipurpose presentation system," "Rule based information
     retrieval," "Incremental informal program acquisition," and
     "Results in knowledge based program synthesis."

21.  Abstract security classification:  UNCLASSIFIED
22a. Name of responsible individual:  Robert N. Buchal
22b. Telephone number (Include Area Code):  767-
22c. Office symbol:  NM

DD FORM 1473, 83 APR.  Edition of 1 JAN 73 is obsolete.




REPORT DOCUMENTATION PAGE

Security classification of this page:  UNCLASSIFIED

1a.  Report security classification:  UNCLASSIFIED
3.   Distribution/availability of report:  Approved for public release;
     distribution unlimited
4.   Performing organization report number(s):  TM-1014-3
5.   Monitoring organization report number(s):  AFOSR-TR-84-0309
6a.  Name of performing organization:  Advanced Information &
     Decision Systems
6c.  Address (City, State and ZIP Code):  201 San Antonio Circle,
     Suite 286, Mountain View, CA 94040
8b.  Office symbol:  FQ8671
8c.  Address (City, State and ZIP Code):  Building 410,
     Bolling AFB, D.C. 20332
11.  Title (Include Security Classification):  An Informal Study of
     Program Comprehension
12.  Personal author(s):  Domeshek, Eric A.; Shapiro, Daniel G.;
     Dean, Jeffrey S.; McCune, Brian P.
13b. Time covered:  from 1 June 82 to 31 May 83
14.  Date of report (Yr., Mo., Day):  March 1984
15.  Page count:  62
18.  Subject terms (Continue on reverse if necessary and identify by
     block number):  Program Reference Language (PRL), Extended Program
     Model (EPM), Intelligent Program Editor (IPE), program
     documentation, artificial intelligence (AI), knowledge base,
19.  Abstract (Continue on reverse if necessary and identify by block
     number):

     This report describes work performed during the second year of
     research on a Program Reference Language.  During this year, a
     study was conducted in which protocols of programmers studying a
     new program (with the intent of debugging it) were analyzed, both
     for the vocabulary used and for indications of strategies adopted
     in their efforts at program comprehension.  A sampling of
     programmers' natural vocabulary for referencing programs was
     gathered and analyzed.  Preliminary steps were taken towards using
     this data as the basis for the design of a formal query language
     for the PRL.  The study also raised some new issues bearing on the
     implementation of systems which use the PRL: individual differences
     imply the need for customization; context-sensitive information
     management is important; and useful user interface features were
     identified.

DD FORM 1473, 83 APR.  Edition of 1 JAN 73 is obsolete.




